Explaining DistilBERT for Sequence Classification

Loading and Testing the Model

Here we load the tokenizer, a DistilBERT model fine-tuned on the IMDb dataset, and an untrained DistilBERT model.

Attempt 1: Explaining the Structure of the Model with Attention over the Layers

In this attempt we visualize the attention of both models, as well as the difference between the attention of the trained and the untrained model, to see which attention patterns were strengthened by training on the IMDb dataset.

First we define two sentences, one negative and one positive example, and see what each model predicts.
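Turning the model's raw output into a label works the same way for both models: softmax over the two class logits, then argmax. A small sketch of that step; the label order (`negative` at index 0, `positive` at index 1) is an assumption and should be checked against the checkpoint's `id2label` mapping:

```python
import numpy as np

def predict_label(logits, labels=("negative", "positive")):
    """Softmax over the class logits, then pick the most likely label."""
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs = e / e.sum()
    return labels[int(np.argmax(probs))], probs

# Example logits as a sequence-classification head might emit them.
label, probs = predict_label(np.array([-1.3, 2.1]))
# label == "positive"
```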

Visualize attention for each model for each sentence

Now we will visualize the attention of each model for each sentence to see what the attention patterns look like.
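One simple way to plot the attention is a heatmap per layer, averaging over the heads. The notebook may use a dedicated tool such as bertviz; this matplotlib sketch uses random placeholder attention in the same shape that `model(**inputs, output_attentions=True).attentions` returns (a tuple of `(batch, heads, seq, seq)` tensors, one per layer):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt

# Placeholder attention: in the notebook this comes from the model's
# forward pass with output_attentions=True.
rng = np.random.default_rng(0)
tokens = ["[CLS]", "the", "movie", "was", "great", "[SEP]"]
n_layers, n_heads, seq = 6, 12, len(tokens)
attentions = [rng.random((1, n_heads, seq, seq)) for _ in range(n_layers)]

fig, axes = plt.subplots(1, n_layers, figsize=(3 * n_layers, 3))
for layer, (ax, att) in enumerate(zip(axes, attentions)):
    mean_att = att[0].mean(axis=0)  # average attention over the heads
    ax.imshow(mean_att, cmap="viridis")
    ax.set_title(f"layer {layer}")
    ax.set_xticks(range(seq))
    ax.set_xticklabels(tokens, rotation=90)
    ax.set_yticks(range(seq))
    ax.set_yticklabels(tokens)
fig.tight_layout()
```

Each cell of a heatmap shows how strongly the row token attends to the column token in that layer.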

Untrained model with positive text

Untrained model with negative text

Trained model with positive text

Trained model with negative text

Visualize attention difference between models for each sentence

Now we want to see the differences between the two attention matrices and visualize them for each sentence. Note that only positive differences are shown: attention that became weaker than in the untrained model is not visualized, only attention that became stronger.
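The clipping described above can be sketched as an element-wise difference that zeroes out negative entries; the helper name `attention_gain` is mine, not from the notebook:

```python
import numpy as np

def attention_gain(trained_att, untrained_att):
    """Element-wise attention difference, keeping only positive gains.

    Attention that *decreased* relative to the untrained model is
    clipped to zero, so the heatmap shows only what got stronger.
    """
    diff = np.asarray(trained_att) - np.asarray(untrained_att)
    return np.clip(diff, 0.0, None)

# Toy 2x2 attention matrices for two tokens.
trained_att = np.array([[0.7, 0.3], [0.2, 0.8]])
untrained_att = np.array([[0.5, 0.5], [0.5, 0.5]])
gain = attention_gain(trained_att, untrained_att)
# gain == [[0.2, 0.0], [0.0, 0.3]]
```

The resulting matrix can be fed into the same heatmap plotting used for the raw attention.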

Positive text

Negative text